Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve histogram consistency #161

Merged
merged 1 commit into from
Aug 24, 2023

Conversation

dolik-rce
Copy link
Contributor

Hi, I have noticed, that some of our tooling has problems with histograms scraped from nginx.

The problem seems to be, that one worker processes collect() call, while another one is in the middle of updating histogram counters. This might result in situation, where the data contain higher count for some le values then for le="+Inf".

Such inconsistency breaks all kinds of assumption that various tools might have. For us it has manifested by prometheus thinking that some counters produced by recorded rules were restarted (because {le="+Inf"} - {le="0.1"} was negative), which resulted in huge jumps in the metrics.

The proposed fix is rather simple, just set the infinity value before other buckets (which are already incremented in descending order). It actually still allows to return inconsistent values, but at least the common assumption of non-decreasing bucket values can't be broken.

Proper fix would probably require locking, as there is AFAIK no way to atomically increment multiple values in the shared dict.

@knyar knyar merged commit 0d790d0 into knyar:main Aug 24, 2023
3 checks passed
@knyar
Copy link
Owner

knyar commented Aug 24, 2023

Thank you! Do you want me to cut a new release that will include this change?

@dolik-rce
Copy link
Contributor Author

We have found a workaround that avoids using histograms, which solves the problem completely. So we are not in a hurry, release can wait until there is something more important.

BTW: It might be good idea to mention this problem with histograms atomicity in the documentation...

@dolik-rce dolik-rce deleted the improve-histogram-consistency branch August 24, 2023 13:38
knyar added a commit that referenced this pull request May 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants